Language Recognition for Mono- and Multi-lingual Documents
Not recorded
Abstract
In this paper we describe language recognition algorithms for mono- and multi-lingual documents that are based on mixed-order n-grams, Markov chains, maximum likelihood, and dynamic programming. We compare the monolingual algorithm with those suggested by other researchers. The comparison suggests that it significantly outperforms commonly used language recognition algorithms. We then describe the multilingual algorithm, which segments a multilingual document into single-language chunks and identifies the language of each chunk.
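The ingredients named in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes fixed-order character bigrams instead of mixed-order n-grams, add-one smoothing over a hypothetical 256-symbol alphabet, and a hand-picked `switch_penalty`; the class and function names and the tiny training strings are invented for illustration. Monolingual recognition picks the language whose Markov chain gives the text the highest log-likelihood, and multilingual segmentation uses dynamic programming (a Viterbi-style pass) to label each character with a language.

```python
import math
from collections import Counter


class CharBigramModel:
    """Character bigram Markov chain with add-one smoothing (illustrative assumption)."""

    def __init__(self, text, alphabet_size=256):
        self.pair_counts = Counter(zip(text, text[1:]))
        self.prev_counts = Counter(text[:-1])
        self.alphabet_size = alphabet_size  # hypothetical alphabet size used for smoothing

    def log_prob(self, prev, cur):
        # Smoothed estimate of log P(cur | prev)
        num = self.pair_counts[(prev, cur)] + 1
        den = self.prev_counts[prev] + self.alphabet_size
        return math.log(num / den)

    def score(self, text):
        # Log-likelihood of the text under this language's chain
        return sum(self.log_prob(a, b) for a, b in zip(text, text[1:]))


def identify(text, models):
    """Monolingual case: return the language with the highest log-likelihood."""
    return max(models, key=lambda lang: models[lang].score(text))


def segment(text, models, switch_penalty=5.0):
    """Multilingual case: label every character with a language via dynamic programming."""
    if not text:
        return []
    langs = list(models)
    best = {lang: 0.0 for lang in langs}  # best path score ending in each language
    back = []  # back-pointers, one dict per transition
    for prev, cur in zip(text, text[1:]):
        new_best, pointers = {}, {}
        for lang in langs:
            # Either stay in the same language or pay a penalty to switch into it
            src = max(langs, key=lambda k: best[k] - (0.0 if k == lang else switch_penalty))
            cost = 0.0 if src == lang else switch_penalty
            new_best[lang] = best[src] - cost + models[lang].log_prob(prev, cur)
            pointers[lang] = src
        best = new_best
        back.append(pointers)
    # Backtrack from the best final language
    lang = max(langs, key=lambda k: best[k])
    labels = [lang]
    for pointers in reversed(back):
        lang = pointers[lang]
        labels.append(lang)
    labels.reverse()
    return labels


# Toy training data, far too small for real use
models = {
    "en": CharBigramModel("the quick brown fox jumps over the lazy dog and the cat sat on the mat with the hat"),
    "de": CharBigramModel("der schnelle braune fuchs springt über den faulen hund und die katze sitzt auf der matte"),
}
```

Calling `identify("the dog and the cat", models)` selects a single language for the whole string, while `segment(text, models)` returns one label per character; consecutive runs of the same label form the single-language chunks. A larger `switch_penalty` discourages short, spurious chunks.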
Similar resources
Approaching Multi-Lingual Emotion Recognition from Speech - On Language Dependency of Acoustic/Prosodic Features for Anger Recognition
In this paper, we describe experiments on automatic emotion recognition using comparable speech corpora collected from real-life American English and German Interactive Voice Response systems. We compute the optimal set of acoustic and prosodic features for mono-, cross- and multi-lingual anger recognition, and analyze the differences. When an emotion recognition system is confronted with a langu...
A Comparison in Reading Ability and Achievement between Mono-Lingual and Bilingual Fifth Graders
Y. Adib, Ph.D., Z. Sharifi, N. Mahmoodi. To compare the reading ability and academic achievement of Farsi-speaking mono-lingual fifth graders with those of their bilingual Aazari and Kordi counterparts, three samples of 153, 132, and 145 (total 430) such students from three cities o...
N-Gram Language Modeling for Robust Multi-Lingual Document Classification
Statistical n-gram language modeling is used in many domains, such as speech recognition, language identification, machine translation, character recognition and topic classification. Most language modeling approaches work on n-grams of terms. This paper reports on ongoing research in the MEMPHIS project, which employs models based on character-level n-grams instead of term n-grams. The models ar...
Integrating Query Translation and Text Classification in a Cross-Language Patent Access System
This paper presents a cross-language patent retrieval and classification system that integrates query translation, performed with various free web translators, with document classification. A language-independent indexing method was used to process the multilingual patent documents, and the query translation method was used to translate the query from the source language to t...
Sequence-based Multi-lingual Low Resource Speech Recognition
Techniques for multi-lingual and cross-lingual speech recognition can help in low resource scenarios, to bootstrap systems and enable analysis of new languages and domains. End-to-end approaches, in particular sequence-based techniques, are attractive because of their simplicity and elegance. While it is possible to integrate traditional multi-lingual bottleneck feature extractors as front-ends...
Publication date: 2007